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Abstract 

Power-law distributions contain precious information about a large variety of processes in geo- 
science and elsewhere. Although there are sound theoretical grounds for these distributions, the 
empirical evidence in favor of power laws has been traditionally weak. Recently, Clauset et al. 
have proposed a systematic method to find over which range (if any) a certain distribution behaves 
as a power law. However, their method has been found to fail, in the sense that true (simulated) 
power-law tails are not recognized as such in some instances, and then the power-law hypothesis is 
rejected. Moreover, the method does not work well when extended to power-law distributions with 
an upper truncation. We explain in detail a similar but alternative procedure, valid for truncated 
as well as for non-truncated power-law distributions, based in maximum likelihood estimation, 
the Kolmogorov-Smirnov goodness-of-fit test, and Monte Carlo simulations. An overview of the 
main concepts as well as a recipe for their practical implementation is provided. The performance 
of our method is put to test on several empirical data which were previously analyzed with less 
systematic approaches. The databases presented here include the half-lives of the radionuclides, 
the seismic moment of earthquakes in the whole world and in Southern California, a proxy for the 
energy dissipated by tropical cyclones elsewhere, the area burned by forest fires in Italy, and the 
waiting times calculated over different spatial subdivisions of Southern California. We find the 
functioning of the method very satisfactory. 

I. INTRODUCTION 

Over the last decades, the importance of power-law distributions has continuously increased, 
not only in geoscience but elsewhere These are probability distributions defined by a proba- 
bility density (for a continuous variable x) or by a probability mass function (for a discrete variable 
x) given by, 

/w - 

for jc > a and a > 0, with a normalization factor (hidden in the proportionality symbol oc) which de- 
pends on whether .jc is continuous or discrete. In any case, normalization implies a > 1. Sometimes 
power-law distributions are also called Pareto distributions CD O (or Riemann zeta distributions 
in the discrete case [3]), although in other contexts the name Pareto is associated to a slightly 
different distribution [IJ. So we stick to the clearer term power- law distribution. 
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These have remarkable, non-usual statistical properties, as are scale invariance and divergence 
of moments. The first one means that power-law functions (defined between and 0°) are invariant 
under (properly performed) linear rescaling of axes (both x and /) and therefore have no charac- 
teristic scale, and hence cannot be used to define a prototype of the observable represented by x 
[|4]-(3. For example, no unit of distance can be defined from the gravitational field of a point mass 
(a power law), whereas a time unit can be defined for radioactive decay (an exponential function). 
However, as power-law distributions cannot be defined for all jc > but for x > a > their scale 
invariance is not "complete" or strict. 

A second uncommon property is the non-existence of finite moments; for instance, if a < 2 not 
a single finite moment exists (no mean, no variance, etc.). This has important consequences, as 
the law of large numbers does not hold (H, p. 65, i.e., the mean of a sample does not converge as 
the size of the sample increases; alternatively, we can say that the sample mean tends to infinity, 
which is another way to understand the law of large numbers in this case [9], p. 393. If 2 < 
a < 3 the mean exists and is finite, but higher moments are infinite, which means for instance 
that the central limit theorem, in its classic formulation, does not apply (the mean of a sample 
is not normally distributed and has infinite standard deviation) [lOJ. Higher a's yield higher- 
order moments infinite, but then the situation is not so "critical". Newman reviews other peculiar 
properties of power-law distributions, such as the 80|20 rule [O. 

Although the normal (or Gaussian) distribution gives a non-zero probability that a human being 
is 10 m or 10 km tall, the definition of the probability density up to infinity is not traumatic at 
all, and the same happens with an exponential distribution and most "standard" distributions in 
probability theory. However, one already sees that the power-law distribution can be problematic, 
in particular for a < 2, as it predicts an infinite mean, and for 2 < a < 3, as the variability of 
the sample mean is infinite. Of course, there can be variables having an infinite mean (we can 
easily simulate in our computer processes in which the time between events has an infinite mean), 
but in other cases, for physical reasons, the mean should be finite. In such situations a simple 
generalization is the truncation of the tail [[II [HI [13, yielding the truncated power-law distribution, 
defined in the same way as before by f{x) oc 1 /x" but with a < x < b, with b finite, and with 
normalizing factor depending now on a and b (in some cases it is possible to have a = 0, see 
next section). Obviously, the existence of a finite upper cutoff b automatically leads to well- 
behaved moments, if the statistics is enough to "see" the cutoff; on the other hand, a range of scale 
invariance can persist, if b^ a. What one finds in some practical problems is that the statistics is 
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not enough to decide which is the sample mean and one cannot easily conclude if a pure power 
law or a truncated power law is the right model for the data. 

A well known example of (truncated or not) power-law distribution is the Gutenberg-Richter 
law for earthquake "size" [fTSj - fTSl . If by size we understand radiated energy, the Gutenberg-Richter 
law implies that, in any seismically active region of the world, the sizes of earthquakes follow a 
power-law distribution, with an exponent a = 1 -1-25/3 and B close to 1. In this case, scale 
invariance means that the simple question: how big (in terms of radiated energy) are earthquakes 
there(?), has no possible answer. The non-convergence of the mean energy can easily be checked 
from data: catastrophic events such as the Sumatra- Andaman mega-earthquake of 2004 contribute 
to the mean much more than the previous recorded history [T61. Note that for the most common 
formulation of the Gutenberg-Richter law, in terms of the magnitude, earthquakes are not power- 
law distributed, but this is due to the fact that magnitude is an (increasing) exponential function 
of radiated energy, and therefore magnitude turns out to be exponentially distributed. In terms of 
magnitude, the statistical properties of earthquakes are trivial (well behaved mean, existence of a 
characteristic magnitude...), but we insist that this is not the case in terms of radiated energy. 

Malamud [ fTTl lists several other natural hazards following power-law distributions in some 
(physical) measure of size, such as landslides, rockfalls, volcanic eruptions [fTSl [T9il . and forest 
fires [12011 . and we can add rainfall |i2Tl|22l, tropical cyclones (roughly speaking, hurricanes) ll23l . 
auroras [|24l . tsunamis [1251 . etc. In some cases this broad range of responses is triggered simply by 
a small driving or perturbation (the slow motion of tectonic plates for earthquakes, the continuous 
pumping of solar radiation in hurricanes, etc.); then, this highly nonlinear relation between input 
and output can be labeled as crackling noise [|26l. Notice that this does not apply for tsunamis, for 
instance, as they are not slowly driven (or at least not directly slowly driven). 

Aschwanden [27J reviews disparate astrophysical phenomena which are distributed according 
to power laws, some of them related to geoscience: sizes of asteroids, craters in the Moon, solar 
flares, and energy of cosmic rays. In the field of ecology and close areas, the applicability of 
power-law distributions has been overviewed by White et al., mentioning also island and lake sizes 
[|28l. Reference [ fTT| provides bibliography for power- law and other heavy-tailed distributions in 
diverse disciplines, including hydrology, and Ref. [fT2]| provides interesting geological examples. 

A theoretical framework for power-law distributed sizes (and durations) of catastrophic phe- 
nomena not only in geoscience but also in condensed matter physics, astrophysics, biological 
evolution, neuroscience, and even the economy, is provided by the concept of self-organized crit- 
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icality, and summarized by the sandpile paradigm IH |29] - l32l . However, although the ideas of 
self-organization and criticality are very reasonable in the context of most of the geosystems men- 
tioned above [|33l - l35l . one cannot rule out other mechanisms for the emergence of power-law 
distributions |32l EH [23 ■ 

On the other hand, it is interesting to mention that, in addition to sizes and durations, power- 
law distributions have also been extensively reported in time between the occurrences of natural 
hazards (waiting times), as for instance in solar flares Il38ll39l , earthquakes [|40l - l42l . or solar wind 
P3ll : in other cases the distributions contain a power-law part mixed with other factors [|44l - l47]| . 
Nevertheless, the possible relation with critical phenomena is not direct [|48l |49ll . The distance 
between events, or jumps, has received relatively less attention [|50l - [52ll . 

The importance of power-law distributions in geoscience is apparent; however, some of the 
evidence gathered in favor of this paradigm can be considered as "anecdotic" or tentative, as it is 
based on rather poor data analysis. A common practice is to find some (naive or not) estimation 
of the probability density or mass function f{x) and plot ln/(x) versus Injc and look for a linear 
dependence between both variables. Obviously, a power-law distribution should behave in that 
way, but the opposite is not true: an apparent straight line in a log-log plot of f{x) should not be 
considered a guarantee of an underlying power-law distribution, or perhaps the exponent obtained 
from there is clearly biased [i28l[53] - l55l . But in order to discriminate between several competing 
theories or models, as well as in order to extrapolate the available statistics to the most extreme 
events, it is very important to properly fit power laws and to find the right power- law exponent (if 
any) [28]. 

The subject of this paper is a discussion on the most appropriate fitting, testing of the goodness- 
of-fit, and representation of power-law distributions, both non-truncated and truncated. A consis- 
tent and robust method will be checked on several examples in geoscience, including earthquakes, 
tropical cyclones, and forest fires. The procedure is in some points analogous to that of Clauset et 
al. [|54ll , although there are variations is some key steps, in order to correct several drawbacks of 
the original method [|2Tll56l . The most important difference is in the criterion to select the range 
over which the power law holds. As the case of most interest in geoscience is that of a continu- 
ous random variable, the more involving discrete case will be postponed to a separate publication 

mil El. 
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II. POWER-LAW FITS AND GOODNESS-OF-FIT TESTS 



A. Non-truncated and truncated power-law distributions 

Let us consider a continuous power-law distribution, defined in the range a < x < b, where b 
can be finite or infinite and a > 0. The probability density of x is given by, 



a-l /r" 



al-a_l/^a-l 

the limit b oo provides the non-truncated power-law distribution if a > 1 and a > 0, also called 
here pure power law; otherwise, for finite b one has the truncated power law, for which no restric- 
tion exists on a if a > 0, but a < 1 if a = (which is sometimes referred to as the power-function 
distribution [l2l); the case a = I needs a separate treatment, with 

^^"^^ " xln{b/a) ■ 

We will consider in principle that the distribution has a unique parameter, a, and that a and b are 

fixed and known values. Remember that, at point x, the probability density function of a random 

variable is defined as the probability per unit of the variable that the random variable lies in a 

infinitesimal interval around x, that is, 

^, , Prob[x < random variable < x + Axl 

f(x) = lim 

^ ' ^Y^o Ax 

and has to verify f{x) > and J°^^f{x)dx = 1, see for instance Ref. [39|. 

Equivalently, we could have characterized the distribution by its (complementary) cumulative 

distribution function, 

poo 

S{x) = Prob [random variable >x]= f{x)dx. 

Jx 

For a truncated or non-truncated power law this leads to 

iA"-i-i/z.«-i 

if a 7^ 1 and 



\n{b/a) ' 

if a = 1. Note that although f{x) always has a power-law shape, S{x) only has it in the non- 
truncated case (b o° and a > 1); nevertheless, even not being a power law in the truncated case, 
the distribution is a power law, as it is f{x) and not S{x) which gives the name to the distribution. 



B. Problematic fitting metliods 



Given a set of data, there are many methods to fit a probability distribution. Goldstein et al. 
Il55]| . Bauke [|53l, White et al. [|28]|, and Clauset et al. check several methods based in the 
fitting of the estimated probability densities or cumulative distributions in the power-law case. As 
mentioned in the first section, ln/(x) is then a linear function of ln.x;, both for non-truncated and 
truncated power laws. The same holds for \nS{x), but only in the non-truncated case. So, one 
can either estimate f{x) from data, using some binning procedure, or estimate S{x), for which no 
binning is necessary, and then fit a straight line by the least-squares method. As we find White 
et al.'s study the most complete, we summarize their results below, although those of the other 
authors are not very different. 

For non-truncated power-law distributions. White et al. find that the results of the least-squares 
method using the cumulative distribution are reasonable, although the points in S{x) are not in- 
dependent and linear regression should yield problems in this case. We stress that this procedure 
only can work for non-truncated distributions (i.e., with b oo), truncated ones yield bad results 
ttH. 

The least-squares method applied to the probability density f{x) has several variations, depend- 
ing on the way of estimating f{x). Using linear binning one obtains a simple histogram, for which 
the fitting results are catastrophic ll28l [53l [55l [60l . This is not unexpected, as linear binning of 
a heavy-tailed distribution can be considered as a very naive approach. If instead of linear bin- 
ning one uses logarithmic binning the results improve (when done "correctly"), and are reasonable 
in some cases, but they still show some bias, high variance, and depend on the size of the bins. 
A fundamental point should be to avoid having empty bins, as they are disregarded in logscale, 
introducing an important bias. 

In summary, methods of estimation of probability-distribution parameters based on least- 
squares fitting can have many problems, and usually the results are biased. Moreover, these 
methods do not take into account that the quantity to be fitted is a probability distribution (i.e., 
once the distributions are estimated, the method is the same as for any other kind of function). 
We are going to see that the method of maximum likelihood is precisely designed for dealing with 
probability distributions, presenting considerable advantages in front of the other methods just 
mentioned. 
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C. Maximum likelihood estimation 



Let us denote a sample of the random variable x with N elements as xi, jc2, . . . , xm, and let 
us consider a probability distribution f{x) parameterized by a. The likelihood function L{a) is 
defined as the joint probability density (or the joint probability mass function if the variable were 
discrete) evaluated atxi,X2, xn in the case in which the variables were independent, i.e., 

L(a) = n/(x,). 

Note that the sample is considered fixed, and it is the parameter a what is allowed to vary. In prac- 
tice it is more convenient to work with the log-likelihood, the natural logarithm of the likelihood 
(dividing by also, in our definition), 

£(a) = ilnL(a) = if; ln/(x,)- 

!=1 

The maximum likelihood (ML) estimator of the parameter a based on the sample is just the max- 
imum of i{a) as a function of a (which coincides with the maximum of L{a), obviously). For 
a given sample, we will denote the ML estimator as tte (e is from empirical), but it is important 
to realize that the ML estimator is indeed a statistic (a quantity calculated from a random sample) 
and therefore can be considered as a random variable; in this case it is denoted as a. In a formula, 

ttg = argmaxy^£(a), 

where argmax refers to the argument of the function i that make it maximum. 

For the truncated or the non-truncated continuous power-law distribution we have, substituting 
f{x) and introducing r = a/b, disregarding the case a = 0, 

CC — 1 9 

£(a) = In — r - aln — Ina, if a 7^ 1, 

1 — r" ^ a 

£{a) = -Inln- -In^, if a = 1; 

g is the geometric mean of the data, Ing = N^^ Y!i Injc,, and the last term in each expression is 
irrelevant for the maximization of i{a). Remember that the distribution is only parameterized by 
a, whereas a and b (and r) are constant parameters; therefore, i{a) is not a function of a and b, 
but of a. In order to find the maximum one can derive with respect a and set the result equal to 
zero mM, 



di{a) 



da 



1 r"'' ^ In r ,? 

= T + l ^-ln-=0, 

ae-1 l-r«^-i a 



which constitutes the so-called likelihood equation for this problem. For a non-truncated distribu- 
tion, r = 0, and it is clear that there is one and only one solution, 



which corresponds to a maximum, as 

L(a) = = Aiioc- ife-'^'^HsH^ 

has indeed a maximum (resembling a gamma probability density, see next subsection). Figure 1 
illustrates the log-likelihood function and its derivate, for simulated power-law data. 




FIG. 1: Log-likelihood l{a) and its derivative, for simulated non-truncated power-law data with exponent 
a = 1.15 and a = 0.001. 

In the truncated case it is not obvious that there is a solution to the likelihood equation; however, 
one can take advantage of the fact that the power-law distribution, for fixed a and b, can be viewed 
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as belonging to the regular exponential family, for which it is known that the maximum likelihood 
estimator exists and is unique. Indeed, in the single-parameter case, the exponential family can be 
written in the form, 

where both G{a) and T{x) can be vectors, the former containing the parameter a of the family. 
Then, for 6{a) = —a,T{x) = Inx, and H(x) = I we obtain the (truncated or not) power-law distri- 
bution, which therefore belongs to the regular exponential family, which guarantees the existence 
of a unique ML solution. 

In order to find the ML estimator of the exponent in the truncated case, we proceed, for practical 
reasons, by maximizing directly the log-likelihood i{a) (rather than by solving the likelihood 
equation). We will use the downhill simplex method [|6TI . although any other simpler procedure 
should work, as the problem is one-dimensional. In this case, one needs to take care when the 
value of a gets very close to one in the maximization algorithm, and then replace £(a) by its limit 
at a = 1, 

1 p 
i{a) -^a^i -Inln — ocln — Ina, 
r a 

which is in agreement with the likelihood function for a (truncated) power-law distribution with 
a = 1. 

An important property of ML estimators, not present in other fitting methods, is their invariance 
under re-parameterization. If instead of working with parameter a we use v = h{a), then, the ML 
estimator of v is "in agreement" with that of a, i.e., v = h{a). Indeed, 

d£ _ di da 
dv dadv' 

so, the maximum of £ as a function of v is attained at the point h{6c), provided that the function 
h is "one-to-one". Note that the parameters could be multidimensional as well. Reference Il62ll 
studies this invariance with much more care. 

In their comparative study. White et al. conclude that maximum likelihood estimation outper- 
forms the other methods, as always yields the lowest variance and bias of the estimator. This is 
not unexpected, as the ML estimator is, mathematically, the one with minimum variance among 
aU asymptotically unbiased estimators. This property is called asymptotical efficiency [|28ll53l . 
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D. Standard deviation of the ML estimator 



The interesting result of this subsection is the value of the uncertainty o of a, given by [fTTH . 

-1/2 



1 

(T = 



1 r"" ^In^r 



(a, -1)2 (i-r««-i)2_ 
but we will see below that this formula is not necessary, as o can be calculated from Monte Carlo 
simulations. The rest of the subsection is devoted to the particular derivation of o for a non- 
truncated power-law distribution, and therefore can be skipped by readers not liking probability 
theory. 

For the calculation of a (the ML estimator of a) one needs to realize that this is indeed a 
statistic (a quantity calculated from a random sample) and therefore it can be considered as a 
random variable. Note that a denotes the true value of the parameter, which is unknown. It is 
more convenient to work with a — 1 (the exponent of the cumulative distribution function); in the 
non-truncated case (r = with a > 1) we can easily derive its distribution. First let us consider 
the geometric mean of the sample, g, rescaled by the minimum value a, 

a N a 

As each jc, is power-law distributed (by hypothesis), a simple change of variables shows that 
\n{xi/a) turns out to be exponentially distributed, with scale parameter l/(a — 1); then, the sum 
will be gamma distributed with the same scale parameter and with shape parameter given by 
(this is the key property of the gamma distribution [|63l '). Therefore, ln(^/a) will follow the same 
gamma distribution but with scale parameter N^^{a — 1)^^ 

At this point it is useful to introduce the generalized gamma distribution [[IIIIIIMIj with density, 
for a random variable y >0, 

D(y)= (ly-'e-jy/cf 

where c > is the scale parameter and 7 and d are the shape parameters, which have to verify 
< 7/5 < 00 (so, the only restriction is that they have the same sign, although the previous ref- 
erences only consider 7 > and 5 > 0); the case 5 = 1 yields the usual gamma distribution and 
5 = 7= 1 is the exponential one. Again, changing variables one can show that the inverse z = l/y 
of a generalized gamma variable is also a generalized gamma variable, but with transformed pa- 
rameters, 

7, 5, -7,-5,-. 

c 
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So, a — 1 = z = \/\n[g/a) will have a generalized gamma distribution, with parameters — A'^, 
— 1, and N{a — l) (keeping the same order as above). Introducing the moments of this distribution 

^ r(r/5) 

(valid for m > —y if 7 > and for m < |7| if 7 < 0, and (y'") infinite otherwise), we obtain the 
expected value of a — 1, 



A^-1 

Note that the ML estimator, alpha, is biased, as its expected value does not coincide with the right 
value, a; however, asymptotically, the right value is recovered. An unbiased estimator of a can 
be obtained for a small sample as {I — l/N)ae + l/N, although this will not be of interest for us. 
In the same way, the standard deviation of a — 1 (and of a) turns out to be 

((a-l)2)-(a-l)2 



{\-\/N)^/N^' 

which leads asymptotically to (a — 1)/^/N. In practice, we need to replace a by the estimated 
value tte', then, this is nothing else than the limit r = (Z? — t- 0°) of the general formula stated above 
for a [fTTll . The fact that the standard deviation tends to zero asymptotically (together with the 
fact that the estimator is asymptotically unbiased) implies that any single estimation converges (in 
probability) to the true value, and therefore the estimator is said to be consistent. 



E. Goodness-of-fit test 



One can realize that the maximum likelihood method always yields a ML estimator for a, no 
matter which data one is using. In the case of power laws, as the data only enters in the likelihood 
function through its geometric mean, any sample with a given geometric mean yields the same 
value for the estimation, although the sample can come from a true power law or from any other 
distribution. So, no quality of the fit is guaranteed and thus, maximum likelihood estimation 
should be rather called minimum unlikelihood estimation. For this reason a goodness-of-fit test is 
necessary (although recent works do not take into account this fact lfT3l[28l[65l ). 

Following Goldstein et al. J55l and Clauset et al. [54] we use the Kolmogorov-Smirnov (KS) 
test, based on the calculation of the KS statistic or KS distance de between the theoretical probabil- 
ity distribution, represented by S{x), and the empirical one, Se{x). The latter (which is an unbiased 
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estimator of the cumulative distribution) is given by the stepwise function 

Se{x) = ne{x)/N, 

where ne{x) is the number of data in the sample taking a value of the variable larger than or equal 
to X. The KS statistic is just the maximum difference, in absolute value, between S{x) and ne{x) /N, 
that is, 

de = max \S{x) — Se{x) \ = max 

a<x<b a<x<b 

where the bars denote absolute value. Note that the theoretical cumulative distribution S{x) is 
parameterized by the value of a obtained from ML, oCg. In practice, the difference only needs to 
be evaluated at the points Xi of the sample (as the routine ksone of Ref. [1611 does). A more strict 
mathematical definition uses the supremum instead of the maximum, but in practice the maximum 
works perfectly. We illustrate the procedure in Fig. 2, with a simulation of a non-truncated power 
law. 

Intuitively, if dg is large the fit is bad, whereas if dg is small the fit can be considered as good. 
But the relative scale of dg is provided by its own probability distribution, through the calculation 
of a p— value. Under the hypothesis that the data follow indeed the theoretical distribution, with 
the parameter a obtained from our previous estimation (this is the null hypothesis), the p— value 
provides the probability that the KS statistic takes a value larger than the one obtained empirically, 
i.e., 

p = Prob[KS statistic for power-law data (with a^) is > dg]', 

then, bad fits will have rather small />— values. 

It turns out that, in principle, the distribution of the KS statistic is known, at least asymptotically, 
independently of the underlying form of the distribution, so, 

oo 

PQ = Q{dgVN+0A2dg+0Aldg/VN)=2Y,{-'^y^^&M-^fidgVN+0.l2dg+0.ndg/VN)^], 

j=i 

for which we use the routine in Ref. [61 1 (but note that Eq. (14.3.9) there is not right). 

Nevertheless, this formula will not be accurate in our case, as we are "optimizing" the value of 
a using the same sample to which we apply the KS test, which yields a bias in the test, i.e., the 
formula would work for the true value of a, but not for one obtained by ML, which would yield 
in general a smaller KS statistic and too large p— values (because the fit for ttg is better than for 



^ftg— 1 



X. 



ng{x) 
N 
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X 

FIG. 2: Empirical (complementary) cumulative distribution and its corresponding fit for a simulated non- 
truncated power-law distribution with a = 1.15 and a = 0.001. The point x at which the maximum differ- 
ence between both curves takes place is marked as an illustr ation of the calculation of the KS statistic. 

the true value a) [|54ll55l . However, for this very same reason the formula can be useful to reject 
the goodness of a given fit, i.e., if p obtained in this way is already below 0.05, the true p will be 
even smaller and the fit is certainly bad. But the opposite is not true. In a formula, 

ifQ{deVN + 0.l2de + 0.nde/VN) < 0.05, reject power law, 

otherwise, no decision can be taken yet. As a final comment, perhaps a more powerful test would 
be to use, instead of the KS statistic, the Kuiper's statistic [[6T|, which is a refinement of the former 
one. We have not compared both choices, but it is stated by Clauset et al. that they yield very 
similar results. 

F. The Clauset et al.'s recipe 

Now we are in condition to explain the genuine Clauset et al.'s method [54]. As we are not 
going to apply it, readers not interested or not curious can skip the subsection. The key to fitting 
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a power law is neither the ML estimation of the exponent nor the goodness-of-fit test, but the 
selection of the interval [a,b\ over which the power law holds. Initially, we have taken a and b 
as fixed parameters, but in practice this is not the case, and one has to decide where the power 
law starts and where ends, independently of the total range of the data. In any case, will be the 
number of data in the power-law range (and not the total number of data). 

The recipe of Clauset et al. applies to non-truncated power-law distributions {b °°), and 
considers that a is a variable which needs to be fit from the sample (values of x below a are outside 
the power-law range). The recipe simply consists in the search of the value of a which yields a 
minimum of the KS statistic, using as a parameter of the theoretical distribution the one obtained 
by maximum likelihood, Ug, for the corresponding a (no calculation of a />— value is required for 
each fixed a). In other words, 

a — the one that yields minimum de 

Then, a global p— value is computed by generating synthetic samples by a mixture of parametric 
and non-parametric bootstrap, to which the same procedure is applied in order to fit a and a (i.e., 
minimization of the KS distance using ML for fitting). 

These authors do not provide any explanation of why this should work, although one can argue 
that, if the data is indeed a power law with the desired exponent, the larger the number of data (the 
smaller the a— value), the smaller the value of de, as de goes as 1/ \/N (for large A^, see previous 
subsection). On the other hand, if for a smaller a the data departs from the power law, this deviation 
should compensate and overcome the reduction in de due to the increase of A^^, yielding a larger de. 
But there is no reason to justify this overcoming. 

Nevertheless, we will not use the Clauset's et al.'s procedure for two other reasons. First, its 
extension to truncated power laws, although obvious, and justifiable with the same arguments, 
yields bad results, as the resulting values of the upper truncation cutoff, b, are highly unstable. 
Second, even for non-truncated distributions, it has been shown that the method fails to detect 
the existence of a power law for data simulated with a power- law tail [jSH: the method yields 
an (3— value well below the true power-law region, and therefore the resulting p is too small for 
the power law to become acceptable. We will explain an alternative method that avoids these 
problems, but first let us come back to the case with a and b fixed. 
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G. Monte Carlo simulations 

Remember that we are considering a power law distribution, defined in a < x < b. We already 
have fit the distribution, by ML, and we are testing the goodness of the fit by means of the KS 
statistic. In order to obtain a reliable p— value for this test we will perform Monte Carlo simulations 
of the whole process. A synthetic sample power-law distributed and with elements can be 
obtained in a straightforward way, from the inversion or transformation method [|59ll6Tl[66ll . 



where m,- represents a uniform random number in [0, 1). Use your favorite random number gener- 
ator for it [ED. 

H. Application of the complete procedure to the syntetic sample and calculation of value 

The previous fitting and testing procedure is applied in exactly the same way to the synthetic 
sample, yielding a ML exponent (from synthetic, or simulated) and a KS statistic ds, computed 
as the difference between the theoretical cumulative distribution, with parameter a^, and the sim- 
ulated one, ns{x) /N (obtained from simulations with Ug, as described in the previous paragraph), 
i.e.. 



Both values of the exponent, oCj, and a^, should be close, but not necessarily the same. Note that 
we are not parameterizing S{x) by the empirical value ttg, but with a new fitted value a^. This is in 
order to avoid biases, as a parameterization with would lead to worse fits (as the best one would 
be with tts) and therefore to larger values of the resulting KS statistic and to artificially smaller 
p— values. So, although the null hypothesis of the test is that the exponent of the power law is ttg, 
and synthetic samples are obtained with this value, no further knowledge of this value is used in 
the test. This is the procedure used in Refs. [|54l[67ll . but it is not clear if it is the one of Ref. [|55l . 

In fact, one single synthetic sample is not enough to do a proper comparison with the empirical 
sample, and we repeat the simulation many times (avoiding of course to use the same seed of the 
random number generator for each sample). The most important outcome is the set of values of 
the KS statistic, ds, which allows to estimate its distribution. The p— value is simply calculated as 



a 



Xi = 



[l_(l_^a.-l)j,^.]l/(a«-l)' 




number of simulations with ds > de 



P = 



Ns 
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where Ns is the number of simulations. Figure 3 shows an example of the distribution of the KS 
statistic for simulated data, which can be used as a table of critical values when the number of data 
and the exponent are the same as in the example [,55 J . 



0.8 
0.6 
0.4 

0.2 

P 




0.01 



S(ds) 
p-value 




0.02 0.03 dg 0.04 



0.05 



FIG. 3: Cumulative (complementary) distribution of the Kolmogorov-Smirnov statistic for simulated non- 
truncated power-law distributions with a = 1.15. 

The standard deviation of the p— value can be calculated just using that the number of simula- 
tions with ds > de is binomially distributed, with standard deviation \/Nsp{l — p) and therefore 
the standard deviation of p is the same divided by A^,, 



^P (I-P) 

In fact, the p— value in this formula should be the ideal one (the one of the whole population) but 
we need to replace it by the estimated value; further, when doing estimation from data, Ns should 
be Ns—l, but we have disregarded this bias correction. It will be also useful to consider the relative 
uncertainty of p, which is the same as the relative uncertainty of the number of simulations with 
ds > de (as both are proportional). Dividing the standard deviation of p by its mean (which is p). 
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we obtain 




\-p 



pNs 



number of simulations with ds > dg 



l-p 



(we will recover this formula for the error of the estimation of the probability density). 

In this way, small p— values are associated to large values of dg, and therefore to bad fits. 
However, note that if we put the threshold of rejection in, let us say, p < 0.05, even true power-law 
distributed data, with exponent tte, yield "bad fits" in one out of 20 samples (on average). So we 
are rejecting true power laws in 5 % of the cases (type I error). On the other hand, lowering the 
threshold of rejection would reduce this problem, but would increase the probability of accepting 
false power laws (type II error). In this type of tests a compromise between both types of errors 
is always necessary, and depends on the relative costs of rejecting a true hypothesis or accepting a 
false one. 

In addition, we can obtain from the Monte Carlo simulations the uncertainty of the ML estima- 
tor, just computing oCs, the average value of a^, and from here its standard deviation. 



o= y(a,-a,)2. 

This procedure yields good agreement with the analytical formula of Ref. [fTTI . but can be much 
more useful in the discrete power-law case. 

I. Alternative method to the one by Clauset et al. 

At this point, for given values of the truncation points, a and b, we are able to obtain the 
corresponding ML estimation of the power-law exponent as well as the goodness of the fit, by 
means of the p— value. Now we face the same problem Clauset et al. tried to solve: how to select 
the values of a and bl We adopt the simple method proposed by Peters et al. [[2r|: sweeping 
many different values of a and b we should find, if the null hypothesis is true (i.e., if the sample is 
power-law distributed), that many sets of intervals yield acceptable fits (high enough p— values), 
so we need to find the "best" of such intervals. And which one is the best? For a non-truncated 
power law the answer is easy, we select the largest interval, i.e., the one with the smaller a. All the 
other acceptable intervals will be inside this one. 

But if the power law is truncated the situation is not so clear, as there can be several non- 
overlapping intervals. In fact, many true truncated power laws can be contained in the data, at 
least there are well know examples of stochastic processes with double power-law distributions 
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PTll68l - r70l . At this point any selection can be reasonable, but if one insists in having an automatic, 
blind procedure, a possibility is to select either the interval which contains the larger number of 
data, A'^ [|2T]| , or the one which has the larger log-range, b/a. For double power-law distributions, in 
which the exponent for small x is smaller than the one for large x, the former recipe has a tendency 
to select the first (small x) power-law regime, whereas the second procedure usually leads to the 
other (large x) power law. 

In summary, the final step of the method for truncated power-law distributions is contained in 
the formula 

[a,b\ = the one that yields higher or )• provided that p > pc, 

b/a 

which contains in fact two procedures, one maximizing N and the other maximizing b/a. We 
will test both in this paper. For non-truncated power-law distributions the two procedures are 
equivalent. 

One might be tempted to choose Pc = 0.05, however, it is safer to consider a larger value, as for 
instance Pc = 0.20. Note that the p— value we are using is the one for fixed a and b, and then the 
p— value of the whole procedure should be different, but at this point it is not necessary to obtain 
such a p— value, as we should have already come out with a reasonable fit. Figure 4 shows the 
results of the method for true power-law data. 



J. Truncated or non-truncated power-law distribution? 

The most natural, simplest, and interesting choice is to try to fit first a non-truncated power law 
distribution. If an acceptable fit is found, it is expected that a truncated power law, with b > x„uix, 
the largest value of x, would yield also a good fit. In fact, if b is not considered as a fixed value 
but as a parameter to fit, its maximum likelihood estimator when the number of data is fixed, i.e., 
when b is in the range b > Xmax, is be = Xmax- This is easy to see [|11|. just looking at the equation 
for i{a), which increases as b approaches Xmax- (In the same way, the ML estimator of a, for fixed 
number of data, would be = Xmi„, but we are not interested in such a case.) On the other hand, it 
is reasonable that a truncated power law yields a better fit than a non-truncated one, as the former 
has two parameters and the latter only one (assuming that a is fixed, in any case). 
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a 

FIG. 4: Evolution as a function of a of the KS statistic, the false value pQ, the true p— value, and the 
estimated exponent. The true exponent, here called and equal to 1.15, together with a 2a interval, are 
displayed as thin black lines. 

In order to do a proper comparison, in such situations the so-called Akaike information cri- 
terion (AIC) can be used. This is defined simply as the difference between twice the number of 
parameters and twice the maximum of the log-likelihood, i.e., 

AIC = 2 X (number of parameters) — 2(.{ae). 

In general, more parameters lead to better fits, and to higher likelihood, so, the first term compen- 
sates this factor. Therefore, given two models, the one with smaller AIC is preferred. However, 
for the data analyzed in this paper, the power-law regimes are usually not comparable, as they are 
based on different data. So, the AIC will be of little use, except if the data are exactly the same. 
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III. ESTIMATION OF PROBABILITY DENSITIES AND CUMULATIVE DISTRIBUTION 
FUNCTIONS 

The method of maximum likehhood does not rely on the estimation of the probability distri- 
butions, in contrast to other methods. Nevertheless, in order to present the results, it is useful to 
display some representation of the distribution, together with its fit. This procedure has no statis- 
tical value (it cannot provide a substitution of a goodness-of-fit test) but is very helpful as a visual 
guide, specially in order to detect bugs in the algorithms. 



A. Estimation of the probability density 

Remember that the definition of the probability density is 

, . Prob[x < random variable < x + Ax] 
fix) = lim 

but in practice the width Ax cannot tend to zero (there would be no statistics in such case), and 
one has to take a non-zero value of the width. The most usual procedure is to draw a histogram 
using linear binning (bins of constant width); however, there is no reason why the width of the 
distribution should be fixed (even, some authors seem to believe that the most correct procedure is 
to take Ax = 1). In fact. Ax should be chosen in order to balance the necessity of having enough 
statistics (large Ax) with that of having a good sampling of the function (small Ax). For power- 
law distributions and other fat-tailed distributions, which take values across many different scales, 
the right choice depends of the scale of x. In this cases it is very convenient to use the so-called 
logarithmic binning [[BTHTTII . This uses bins that appear as constant in logarithmic scale, but that in 
fact grow exponentially (for which the method is sometimes called exponential binning instead). 
Curiously, this useful method is not considered by classic texts on density estimation [72J. 

Let us consider the semi-open intervals [ac^o)? ['^ij^i), • • • , • • •, also called bins, with 

(^k^\ = bk and = Ba^ (this constant B has nothing to do with the one in the Gutenberg-Richter 
law. Sec. 1). For instance, if 5 = v^TO this yields 5 intervals for each order of magnitude. Notice 
that the width of every bin grows linearly with ak, but exponentially with k, as bk — ak = {B — 
\)ak = ao{B — l)B^. The value of B should be chosen in order to avoid low populated bins, 
otherwise, a spurious exponent equal to one appears f3\\. 

We simply will count the number of occurrences of the variable in each bin. For each value of 
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the random variable xu the corresponding bin is found as 

ln(x;/ao) 
InB • 

Of course, ciq has to be smaller than any possible value of x. For a continuous variable the concrete 
value of ao should be irrelevant (if it is small enough), but in practice one has to avoid that the 
values of coincide with round values of the variable [|56l . 

So, with this logarithmic binning, the probability density can be estimated (following its defi- 
nition) as the relative frequency of occurrences in a given bin divided by its width, i.e., 

, number of occurrences in bin 

^ ^ {bk — ak) X number of occurrences ' 

where the estimation of the density is associated to a value of x represented by x\. The most 
practical solution is to take it in the middle of the interval in logscale, so = ^/a^^- However, 
for sparse data covering many orders of magnitude it is necessary to be more careful. In fact, what 
we are looking for is the point x*^. whose value of the density coincides with the probability of 
being between and divided by the with of the interval. This is the solution of 



bk - cik J at bk - ak 

where / and S are the theoretical distributions. When the distribution and its parameters are known, 
the equation can be solved either analytically or numerically. It is easy to see that for a power-law 
distribution (truncated or not) the solution can be written 

l/a 



4 



(a-l 



5a/2-l(5_i; 
' 5«-l-l 



where we have used that B = b^/a^ (if we were not using logarithmic binning we would have 
to write a bin-dependent B^). Note that for constant (bin-independent) B, i.e., for logarithmic 
binning, the solution is proportional but not equal to the geometric mean of the extremes of the 
bin. Nevertheless, the omission of the proportionality factor does not alter the power-law behavior, 
just shifts (in logarithmic scale) the curve. But for a different binning procedure this is no longer 
true. Moreover, for usual values of B the factor is very close to one [1771 . although large values of 
B [|56ll yield noticeable deviations if the factor in brackets is not included, see also our treatment 
of the radionuclide half-lives in Sec. Ill, with 5=1. Once the value of B is fixed (usually in this 
paper to v^), in order to avoid empty bins we merge consecutive bins until the resulting merged 
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bins are not empty. This leads to a change in the effective value of B for merged bins, but the 
method is still perfectly vaUd. 

The uncertainty of fe{x) can be obtained from its standard deviation (the standard deviation 
of the estimation of the density, /g, not of the original random variable x). Indeed, assuming 
independence in the sample (which is already implicit in order to apply maximum likelihood 
estimation), the number of occurrences of the variable in bin is a binomial random variable (in 
the same way as for the p— value). As the number of occurrences is proportional to fe{x), the ratio 
between the standard deviation and the mean for the number of occurrences will be the same as 
for fe{x), which is, 

_ / ^ - ^ , 

fe (x) V mean number of occurrences in k Voccurrences in k ' 
where we replace the mean number of occurrences in bin k (not available from a finite sample) by 
the actual value, and q, the probability that the occurrences are not in bin k, by one. This estimation 
of oy(^) fails when the number of counts in the bin is too low, in particular if it is one. 

One final consideration is that the fitted distributions are normalized between a and b, with N 
number of data, whereas the empirical distributions include all data, with Nfot of them, Ntot > N. 
Therefore, in order to compare the fits with the empirical distributions, we will plot Nf{x)/Ntot 
together with /e(x^). 



B. Estimation of the cumulative distribution 



The estimation of the (complementary) cumulative distribution is much simpler, as bins are not 
involved. One just needs to sort the data, in ascending order, X(^i^ < JC(2) < • • • < ^{Nm-i) — ^iN,ot)'' 
then, the estimated cumulative distribution is 

. ne{x[i)) Ntot-i+l 

for the data points, constant below these data points, and Se{x) = for x > -^(ATf^,); is the 

number of data with x > JC(,-) in the empirical sample. We use the case of empirical data as an 
example, but it is of course the same for simulated data. For the comparison of the empirical 
distribution with the theoretical fit we need to correct the different number of data in both cases. 
So, we plot both [A^5'(x) +ne{b)]/Ntot and Se{x), in order to check the accuracy of the fit. 
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IV. DATA ANALYZED AND RESULTS 



We have explained how, in order to try to certify that a set of data is compatible with a simple 
power-law distribution, a large number of mathematical formulas and immense calculations are 
required. Now we check the performance of our method with diverse geophysical data, which 
were previously analyzed with different, less rigorous or worse-functioning methods. For the 
peculiarities and challenges of the data set, we also include the half-lives of unstable nuclides. 
The parameters of the method are fixed to A^^ = 1000 Monte Carlo simulations and the values of 
a and b are found sweeping a fixed number of points per order of magnitude, equally spaced in 
logarithmic scale. This number is 10 for non-truncated power laws (in which b is fixed to infinity) 
and 5 for truncated power laws. Three values of pc are considered: 0.1, 0.2, and 0.5, in order to 
compare the dependence of the results on this parameter. 

A. Half-lives of the elements 

Reference ll56l studied the statistics of the half-lives of radioactive elements. The basic entity 
of study was the nuclide, which is defined as an isotope of a chemical element in a concrete 
energy state (either the fundamental one or an excited one). Nuclides are therefore specified by 
the number of protons, the number of neutrons, and the energy state. Roughly speaking, the term 
nuclide can refer both to the atomic species or to its nucleus, but the distinction will be irrelevant 
to us. Unstable nuclides are called radionuclides. 

Any radionuclide has a constant probability of disintegration per unit time, the decay constant, 
let us call it A [TTBI . If M is the total amount of radioactive material at time t, this means that 

1 dM 

= A . 

M dt 

This leads to an exponential decay, for which a half-life or a lifetime x can be defined, as 

, o ln2 
fi/2 = Tln2 = — . 

It is well known that the half-lives take disparate values, for example, that of ^^^U is 4.47 (Ameri- 
can) billions of years, whereas for other nuclides it is a very tiny fraction of a second. 

It has been recently claimed that these half-lives are power-law distributed [56] . In fact, 3 
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power-law regions were identified in the probability density of roughly, 



0.65 



for 10-^5 </'i/2 < 0.1* 
for 100^ < txi2 < lO^^s 
for?i/2 > 10^5. 



1/2 



/(^l/2)°^ < lA 



1.19 



1/2 



lA 



1.09 



V 



1/2 



Notice that there is some overlap between two of the intervals, as reported in the original reference, 
due to problems in delimiting the transition region. The study used variations of the Clauset et 
al.'s method of minimization of the KS statistic [|54ll , introducing and upper cutoff and additional 
conditions to escape from the global minimum of the KS statistic, which yielded the rejection 
(p = 0.000) of the power-law hypothesis. These additional conditions were of the type of taking 
either a or b/a greater than a fixed amount. 

For comparison, we will apply instead the method explained in the previous section to this 
problem. Obviously, our random variable will he x = ti/2- The data is exactly the same as in the 
original reference, coming from the Lund/LBNL Nuclear Data Search web page [17411 . Elements 
whose half-life is only either bounded from below or from above are discarded for the study, which 
leads to 3002 radionuclides with well-defined half-lives; 2279 of them are in their ground state and 
the remaining 723 in an exited state. The minimum and maximum half-lives in the data set are 
3 X 10^^^ s and 7 x 10^^ s, respectively, yielding more than 53 orders of magnitude of logarithmic 
range. Further details are in Ref. (56]. 

The results of our fitting and testing method are shown in Table I and in Fig. 5. The fitting of a 
non-truncated power law yields results in agreement with Ref. [|56l . with a = 1.09±0.01 and a = 
3 X 10^ s, for the 3 values of pc analyzed (0.1, 0.2, and 0.5). When fitting a truncated power law, 
the maximization of the log-range, b/a, yields essentially the same results as for a non-truncated 
power law, with slightly smaller exponents a due to the finiteness of b (results not shown). In 
contrast, the maximization of the number of data A^^ yields and exponent a ~ 0.95 between a ~ 0. 1 
s and b ~ 400 s (with some variations depending on pc). This result is in disagreement with Ref. 
Il56ll . which yielded a smaller exponent for smaller values of a and b. In fact, as the intervals do 
not overlap both results are compatible, but it is also likely that a different function would lead to a 
better fit; for instance, a lognormal between 0.01 s a 10^ s was proposed in Ref. [|56l . although the 
fitting procedure there was not totally reliable. Finally, the intermediate power-law range reported 
in the original paper (the one with a = 1.19) is not found by any of our algorithms working on 
the entire dataset. It is necessary to cut the data set, removing data below, for instance, 100 s 
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TABLE I: Results of the fits for the Ntot = 3002 nuchde half-Uves data, for different values of pc- We show 
the cases of a pure or non-truncated power law (with b = 00, fixed) and truncated power law (with b finite, 
estimated from data), maximizing N. 

N a (s) (s) b/a a±a pc 

143 0.31 6E+08 00 00 1.089± 0.007 0.10 
143 0.31 6E+08 00 00 1.089± 0.007 0.20 
143 0.31 6E+08 00 00 1.089± 0.008 0.50 
1596 0.0794 501 6310 0.952± 0.010 0.10 
1539 0.1259 501 3981 0.959± 0.011 0.20 
1458 0.1259 316 2512 0.950± 0.011 0.50 

(which is equivalent to impose a > 100 s), in order that the algorithm converges to that solution. 
So, caution must be taken when applying the algorithm blindly, as important power-law regimes 
may be hidden by others having either larger A'^ or larger log-range. 

B. Seismic moment of eartliquakes 

The statistics of the sizes of earthquakes [|75l has been investigated not only since the intro- 
duction of the first magnitude scale, by C. Richter, but even before, in the 1930's, by Wadati [fTSl . 
From a modern perspective, the most reliable measure of earthquake size is given by the (scalar) 
seismic moment M, which is the product of mean final slip, rupture area, and the rigidity of the 
fault material [|76l . It is usually assumed that the energy radiated by an earthquake is propor- 
tional to the seismic moment [|14||. so, a power-law distribution of the seismic moment implies a 
power-law distribution of energies, with the same exponent. 

The most relevant results for the distribution of seismic moment are those of Kagan for world- 
wide seismicity [[T3l, who showed that its probability density has a power-law body, with a uni- 
versal exponent in agreement with a = 1.63 5/3, but with an extra, non-universal exponential 
decay (at least in terms of the complementary cumulative distribution). However, Kagan anal- 
ysis, ending in 2000, refers to a period of global seismic "quiescence"; in particular, the large 
Sumatra- Andaman earthquake of 2004 and the subsequent global increase of seismic activity are 
not included. Much recently. Main et al. have shown, using a Bayesian information criterion, that 
the inclusion of the new events leads to the preference of the non-truncated power-law distribution 
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FIG. 5: Estimated probability density of the half-lives of the radionuclides, together with the power-law 
fits explained in the text. The number of log-bins per order of magnitude is one, which poses a challenge in 
the correct estimation of the exponent, as explained in Sec. II. Data below 10^^° s are not shown. 

in front of models with a faster large-M decay. 

We take the Centroid Moment Tensor (CMT) worldwide catalog analyzed by Kagan and by 
Main et al., including now data from January 1977 to December 2010, and apply our statistical 
method to it. Although the statistical analysis of Kagan is rather complete, his procedure is differ- 
ent to ours. Note also that the data set does not comprise the recent (20 11) Tohoku earthquake in 
Japan, nevertheless, the qualitative change in the data with respect to Kagan's period of analysis is 
very remarkable. Following this author, we separate the events by their depth: shallow for depth 
< 70 km, intermediate for 70 km < depth < 300 km, and deep for depth > 300 km. The number 
of earthquakes in each category is 26824, 5281, and 1659, respectively. 

Second, we also consider the Southern California's Waveform Relocated Earthquake Cat- 
alog, from January 1st, 1981 to June 30th, 2011, covering a rectangular box of coordinates 
(122°W,30°N), (113°W,37.5°N) ^UM- This catalog contains 111981 events with m > 2. As, in 
contrast with the CMT catalog, this one does not report the seismic moment M, the magnitudes m 
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there are converted into seismic moments, using the formula 

3. 

logioM= -(m + 6.07), 

where M comes in units of Nm (Newtons times meters); however, this formula is a very rough 
estimation of seismic moment, as it is only accurate (and exact) when m is the so-called moment 
magnitude [|T4ll . whereas the magnitudes recorded in the catalog are of a different type. In any 
case, our procedure here is equivalent to fit an exponential distribution to the magnitudes reported 
in the catalog. 

Tables II and III and Fig. 6 summarize the results of analyzing these data with our method, 
taking x = M as the random variable. Starting with the non-truncated power-law distribution, we 
always obtain an acceptable (in the sense of non-rejectable) power-law fit, valid for several orders 
of magnitude. In all cases the exponent a is between 1.61 and 1.71, but for Southern California 
it is always very close to 1.66. For the worldwide CMT data the largest value of a is 3 x 10^^ 
Nm, corresponding to a magnitude m = 6.25 (for shallow depth), and the smallest is a = 8 x 10^^ 
Nm, corresponding to m = 5.2 (intermediate depth). If the events are not separated in terms of 
their depth (not shown), the results are dominated by the shallow case, except for pc = 0.5, which 
leads to very large values of a and a (a = 5 x 10^^ Nm and a ~ 2). The reason is probably 
the mixture of the different populations, in terms of depth, which is not recommended by Kagan. 
This is an indication that the inclusion of an upper limit b to the power law may be appropriate, 
with each depth corresponding to different b's. For Southern California, the largest a found (for 
Pc = 0.5) is 1.6 X 10^^ Nm, giving m = 4. This value is somewhat higher, in comparison with 
the completeness magnitude of the catalog; perhaps the reason that the power-law fit is rejected 
for smaller magnitudes is due to the fact that these magnitudes are not true moment magnitudes, 
but come from a mixture of different magnitude definitions. If the value of a is increased, the 
number of data A^^ is decreased and the power-law hypothesis is more difficult to reject, due simply 
to poorer statistics. When a truncated power law is fitted, using the method of maximizing the 
number of data leads to similar values of the exponents, although the range of the fit is in some 
cases moved to smaller values (smaller a, and b smaller than the maximum M on the dataset). The 
method of maximizing b/a leads to results that are very close to the non-truncated power law. 
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TABLE II: Results of the non-truncated power-law fit {b = oo) applied to the seismic moment of earthquakes 
in CMT worldwide catalog (separating by depth) and to the Southern California catalog, for different pc- 

Catalog N a a±a pc 

CMT deep 1216 0.1259E-F18 1.622± 0.019 0.10 

intermediate 3701 0.7943E-F17 1.654± 0.011 0.10 

shallow 5799 0.5012E-F18 1.681± 0.009 0.10 

CMT deep 898 0.1995E-F18 1.608± 0.020 0.20 

intermediate 3701 0.7943E-F17 1.654± 0.011 0.20 

shallow 5799 0.5012E-F18 1.681± 0.009 0.20 

CMT deep 898 0.1995E-F18 1.608± 0.021 0.50 

intermediate 3701 0.7943E-F17 1.654± 0.011 0.50 

shallow 1689 0.3162E-F19 1.706± 0.018 0.50 

S. California 1327 0.1000E-F16 1.660± 0.018 10 

S. California 1327 0.1000E-F16 1.660± 0.018 20 

S. California 972 0.1585E-H16 1.654± 0.021 50 

C. Energy of tropical cyclones 

Tropical cyclones are devastating atmospheric-oceanic phenomena comprising tropical depres- 
sions, tropical storms, and hurricanes or typhoons [|79l . Although the counts of events every year 
have been monitored for a long time, and other measurements to evaluate annual activity have 
been introduced (see Ref. llSOl for an overview), little attention has been paid to the statistics of 
individual tropical cyclones. 

In 2005, Emanuel introduced the power-dissipation index (PDI) as a simple way to obtain a 
rough estimation of the total energy dissipated by all tropical cyclones in a given season and some 
ocean basin [fSTIl . But the PDI can also be used to characterize individual events as well, as it was 
done later in Ref. [|23l . Indeed, the PDI is defined as the sum for all the discrete times t (that 
comprise the lifetime of a tropical cyclone) of the cube of the maximum sustained wind speed 
multiplied by the time interval of sampling. At. In a formula, 

PDI = Y,v^At, 

Vf 

where Vt is the maximum sustained wind speed. In the so-called best-track records. At = 6 hours; 
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TABLE III: Same as the previous table, but fitting a truncated power-law distribution, by maximizing the 
number of data. 

Catalog N a b b/a a it a pc 

CMTdeep 1216 0.1259E+18 0.3162E+23 0.2512E+06 1.621± 0.019 0.10 
intermediate 3701 0.7943E+17 0.7943E+22 O.lOOOE+06 1.655± 0.011 0.10 
shallow 13740 0.1259E+18 0.5012E+20 0.3981E+03 1.642± 0.007 0.10 
CMT deep 1076 0.1259E+18 0.5012E+19 0.3981E+02 1.674± 0.033 0.20 
intermediate 3701 0.7943E+17 0.7943E+22 O.lOOOE+06 1.655± 0.011 0.20 
shallow 13518 0.1259E+18 0.1995E+20 0.1585E+03 1.636± 0.008 0.20 
CMTdeep 898 0.1995E+18 0.3162E+23 0.1585E+06 1.604± 0.021 0.50 
intermediate 3701 0.7943E+17 0.7943E+22 O.lOOOE+06 1.655± 0.011 0.50 
shallow 11727 0.1259E+18 0.1995E+19 0.1585E+02 1.608± 0.012 0.50 
S. California 1146 0.1259E+16 0.1259E+22 O.lOOOE+07 1.663± 0.020 10 
S. California 1146 0.1259E+16 0.1259E+22 O.lOOOE+07 1.663± 0.020 20 
S. California 344 0.7943E+16 0.1259E+22 0.1585E+06 1.664± 0.036 50 

this factor would only be necessary in order to compare with other data with different A? (but 
caution should be present in this case for the possible fractal nature of the speed signal). Although 
the speeds are reported in knots, they are converted to m/s (using that 1 knot=0.514 m/s), and then 
we report the PDI in m-^/s^. 

Reference [|23l studied the statistics of the PDI (defined for individual events, in contrast to 
Emanuel's work) in 4 different ocean basins for several time periods. The results showed a rapid, 
perhaps exponential, decay at the tail, but a body of the distribution compatible with a power 
law, for 1 or 2 orders of magnitude, with exponents close to one. The connection with SOC 
phenomena was discussed in Ref. [|33ll . The method used was again a variation of the Clauset's 
one, introducing an upper cutoff and additional restrictions to the variations of the parameters. 
Here we revisit this problem, trying to use updated data (whenever it has been possible), and 
applying the method which is the subject of this paper to x = PDI. 

The data has been downloaded from the National Hurricane Center (NHC) of NOAA, for the 
North Atlantic and the Northeastern Pacific [|82l [831 and from the Joint Typhoon Warning Center 
(JTWC) of the US Navy jHH [85l for the Northwestern Pacific, the Southern Hemisphere (com- 



30 




M [Nm] 

FIG. 6: Estimated probability densities and corresponding power-law fits of the seismic moment M of 
shallow earthquakes in the worldwide CMT catalog and of the estimated M in the Southern California 
catalog. 

prising the Southern Indian and the Southwestern Pacific), and the Northern Indian Ocean. The 
abbreviation, time span, and number of events for each basin are: NAtl, 1966-2011, 532; EPac, 
1966-201 1, 728; WPac, 1986-201 1, 690; SHem, 1986-2007 (up to May), 523; NInd, 1986-2007, 
110. The latter case was not studied in any of the previous works. 

The results for a truncated power law maximizing A'^, shown in Table IV and Fig. 7, are in 
agreement with those of Ref. Il23l . In general, exponents are close but above 1, except for the 
Northwestern Pacific, where a ~ 0.96, and for the North Indian, where a is substantially higher 
than one. We consider that this method performs rather well. It would be interesting to test if 
universality can nevertheless hold (the high value for the North Indian Ocean is based in much less 
data than for the rest of basins), or if there is some systematic bias in the value of the exponents 
(the protocols of the NHC and the JTWC are different, and the satellite coverage of each basin is 
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also different). 
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FIG. 7: Estimated probability densities of the PDI of tropical cyclones in 5 ocean basins, together with their 
power-law fits. The values of the densities are multiplied by 1, 10^, lO'^, 10^ and 10^ for clarity sake. 

If a non-truncated power law is fit to the data, the fits turn out to be rather short, with a high ex- 
ponent (up to 6) describing the tail of the distribution (except for the Southern Hemisphere, where 
no such tail is apparent). We do not give any relevance to these results, as other alternatives, as 
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TABLE IV: Results of the truncated power-law fits, maximizing N, for the PDI in the 5 ocean basins with 

tropical-cyclone activity, for different values of pc. 

basin N airn^/s^) bivtr'/s^) b/a a±o pc 
EPac 637 0.1259E-F10 0.7943E-F11 63 1.094± 0.033 0.10 
NAd 417 0.1995E-F10 0.7943E-F11 40 1.168± 0.047 0.10 
SHem 523 0.1259E-F10 0.1259E-F12 100 1.108± 0.034 0.10 
WPac 637 0.5012E-F09 0.1259E-F12 251 0.957± 0.025 0.10 
NInd 102 0.7943E-F09 0.1995E-F12 251 1.520± 0.077 0.10 
EPac 571 0.1995E-F10 0.7943E-F11 40 1.149± 0.039 0.20 
NAd 417 0.1995E-F10 0.7943E-F11 40 1.168± 0.047 0.20 
SHem 523 0.1259E-F10 0.1259E-F12 100 l.lOSib 0.033 0.20 
WPac 637 0.5012E-F09 0.1259E-F12 251 0.957± 0.025 0.20 
NInd 102 0.7943E-F09 0.1259E-F12 158 1.490± 0.077 0.20 
EPac 571 0.1995E-F10 0.7943E-F11 40 1.149± 0.040 0.50 
NAd 417 0.1995E-F10 0.7943E-F11 40 1.168± 0.045 0.50 
SHem 465 0.1995E-F10 0.1259E-I-12 63 1.132± 0.040 0.50 
WPac 637 0.5012E-F09 0.1259E-F12 251 0.957± 0.024 0.50 
NInd 86 0.7943E-F09 0.1259E-F11 16 1.323± 0.139 0.50 

for instance a simple exponential tail, have to be considered [|80l[86ll . Coming back to a truncated 
power law, but maximizing the log-range, the algorithm sometimes fits the power law in the body 
of the distribution (with exponent close to 1) and for some other times the algorithm goes to the 
fast-decaying tail. So the method of maximizing b/a\$. not useful for this data. 

D. Area of forest fires 

The statistics of the size of forest fires was an intense topic of research since the introduction of 
the concept of SOC, at the end of the 1980's, but only from the point of view of cellular- automaton 
models. Real data analysis had to wait several years [|20l[87]| . leading to power-law distributions, 
more or less in agreement with the models. Here we are particularly interested in a dataset from 
Italy, for which a power-law distribution of sizes was ruled out [|88l . Instead, a lognormal tail was 
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TABLE V: Results of the fits for the burned area of the Ntot = 36 748 fires recorded in the ItaUan catalog, 
for different pc- The cases of a non-truncated power law and a truncated power law, maximizing N, are 
shown. 

A^^ a (ha) b (ha) b/a a±o pc 

51 794 oo oo 2.880± 0.276 0.10 

51 794 oo oo 2.880± 0.277 0.20 

51 794 oo oo 2.880± 0.277 0.50 
168 316 7943 25 2.353± 0.117 0.10 
148 316 1259 4 2.075± 0.213 0.20 

51 794 79430 100 2.870± 0.281 0.50 

proposed for the fire size probability density. 

The data considered in Ref. [88], and reanalyzed in this study, comes from the Archivio In- 
cendi Boschivi (AIB) fire catalog compiled by the Italian Corpo Forestale dello Stato [|89l . The 
subcatalog to which we restrict covers all Italy and spans the 5-year period 1998-2002, containing 
36 748 fires. The size of each fire is measured by the burned area A, in hectares, with 1 ha=10'^ 
m^. In this subsection we analyze the case oix = A. 

The results in Table V and Fig. 8 show that a pure (non-truncated) power law is only acceptable 
(in the sense of non-rejectable) for the rightmost part of the tail of the distribution, comprising less 
than one order of magnitude. It is very indicative that only 51 data are in the possible power-law 
tail. Therefore, we disregard this power-law behavior as spurious and expect that other distribu- 
tions can yield a much better fit (not in order of the quality of the fit but regarding the number of 
data it spans). This seems in agreement with other analyses of forest-fire data [|6li53|. If a trun- 
cated power-law is considered, fitted by maximizing the number of data, the results are not clearly 
better, as seen in the table. Moreover, there is considerable variation with the value of pc- So, we 
do not give any relevance to such power-law fits. Finally, the method of maximizing b/a yields the 
same results as for the non-truncated power law (except by the fact that the exponents are slightly 
smaller, not shown). In order to provide some evidence for the better adequacy of the lognormal 
tail in front of the power-law tail for these data, it would be interesting to apply an adaptation of 
the test explained in Ref. [|90l . 
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FIG. 8: Estimated probability density of the area of fires in the Italian catalog, together with the power-law 
fits. In contrast to the previous datasets analyzed, we consider these power-law fits as irrelevant. 

E. Waiting time between earthquakes 

The temporal properties of earthquakes has been a subject relatively little studied (at least in 
comparison with the size of earthquakes). It is true that the Omori law has been known for more 
than 100 years [|9T1 l92ll . and that this is a law extremely important in order to assess the hazard 
of aftershocks after a big event, but the Omori law looks at time properties in a very coarse- 
grained way, as it only provides the number of events in relatively large time windows. Thus, no 
information on the fine time structure of seismicity is provided, at least directly. 

The situation has changed in the last decade, since the seminal study of Bak et al. [|40l . who 
found a unified scaling law for earthquake waiting-time distributions. They took Southern Cali- 
fornia and divided it in different areas, and computed the time between consecutive earthquakes 
for each area. So, if t- denotes the time of occurrence of the /— th earthquake in area j, the corre- 
sponding waiting time T- is 

J - J _ J 

The key point is that all the resulting waiting times were added to the same distribution (and not 
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to a different distribution j for each area). Subsequently, the unified scaling law was found to be 
valid for other regions of the world [|42l . The shape of the resulting probability density corresponds 
to a double power law, one for small waiting times, associated to a mixture of scales of waiting 
times due to the Omori law, and another for large waiting times, due to spatial heterogeneity arising 
from the mixture of different areas with different seismic rates B0l - l42ll93]| . The first exponent was 
found to be close to 1, whereas the second one was about 2.2; the fits were done by means of the 
nonlinear least-squares Marquardt-Levenberg algorithm from gnuplot, applied to the logarithm 
of the log-binned empirical density. Here we apply our more sophisticated method to updated data 
for Southern California seismicity, with x= T. 

We use again the relocated Southern California catalog of Shearer et al. [TTTIITSI . but starting in 
1984 and ending in June 30th, 201 1. This is to avoid some holes in the catalog for preceding years. 
As for earthquake sizes, the occurrence takes place in a rectangle of coordinates (122°W,30°N), 
(1 13°W,37.5°N). This rectangle is divided into equal parts both in the West-East axis and in the 
South-North axis, in such a way that we consider a number of subdivisions of 4 x 4, 8 x 8, 16 x 16, 
and 32 x 32. The waiting times for events of magnitude m > 2 in each of these subdivisions are 
computed as explained above, resulting in a number of data between 103 000 and 104 000 in all 
cases. 

For a non-truncated power law, the results are only coherent with the previous reported ones 
(exponent around 2.2) for the intermediate cases, i.e., 8x8 and 16 x 16, see Table VI and Fig. 9. 
The disagreement for the other cases can easily be explained. For 4x4, the number of resulting 
subdivisions, 16, seems rather small. As mentioned, in Ref. ||93ll the power-law tail was explained 
in terms of a power-law mixture of exponentials; so, with only 16 regions is possible that the 
asymptotic behavior is still not reached. On the other hand, the effect of the finite duration of the 
catalog is visible in the 32 x 32 data. Due to the scaling behavior of the distributions [|4T1 l42l . 
the possible power-law tail in this case is displaced to larger waiting times; but the time span of 
the catalog, about 10^^ s, clearly alters this power law, which starts to bend at about 10^ s. Thus, 
we conclude that a power-law exponent of about a 2.2 or 2.3 indeed exists, provided that the 
number of spatial subdivisions is high enough and the temporal extension of the catalog is large 
enough. 

When a truncated power-law is fitted, using the method of maximizing the number of data A^^, 
the other power law emerges, but for a range shorter than what the plot of the densities suggests. 
The exponent is in a range from 0.95 to 1.04 (except for the 4x4 cases, in which it is a bit 
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smaller). The largest log-range is 100, i.e., two decades. The graphical representation of the 
density seems to indicate that the possible power law is influenced by the effect of two crossovers, 
one for large waiting times, associated to a change in exponent, and another one for smaller times, 
where the distribution becomes flat. Finally, the method of fitting which maximizes the log-range 
leads to results that are similar to the non-truncated power-law case, although sometimes intervals 
corresponding to very small times are selected. The latter results have no physical meaning, as 
correspond to times below 1 s, i.e., below the error in the determination of the occurrence time. 




FIG. 9: Estimated probability densities and corresponding power- law fits for the waiting times of m > 2 in 
the Southern California catalog, for different spatial subdivisions. The values of the density are multiplied 
by factors 1, 10, 100, and 1000, for clarity sake. 
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TABLE VI: Results of the fits with a non-truncated power law and a truncated power law, maximizing N, for 
earthquake waiting times calculated for different subdivisions of Southern California. Different minimum 
values are shown. The total number of data is above 103 000 in any case. 



Subdivisions 




a(s) 


^(s) 


b/a a ±(7 pc 


16 X 16 


542 0.3162E+08 


OO 


oo 2.324± 0.056 0.10 


32x32 


67 0.3162E+09 


oo 


oo 4.404± 0.405 0.10 


4x4 


124 0.5012E+07 


OO 


oo 1.921ib 0.085 0.10 


8x8 


1671 0.3162E+07 


oo 


oo 2.198ib 0.031 0.10 


16 X 16 


542 0.3162E+08 


oo 


oo 2.324± 0.056 0.20 


32x32 


67 0.3162E+09 


oo 


oo 4.404± 0.403 0.20 


4x4 


124 0.5012E+07 


oo 


oo 1.921± 0.085 0.20 


8x8 


1671 0.3162E+07 


oo 


oo 2.198±0.031 0.20 


16 X 16 


24 0.3162E+09 


oo 


oo 4.106± 0.703 0.50 


32x32 


67 0.3162E+09 


oo 


oo 4.404± 0.449 0.50 


4x4 


77 0.7943E+07 


oo 


oo 1.856± 0.098 0.50 


8x8 


322 0.1259E+08 


oo 


oo 2.231 ±0.070 0.50 


16 X 16 


44178 


7943 


0.7943E+06 100 0.956± 0.004 10 


32x32 


43512 


1259 


0.1995E+06 158 1.029± 0.003 10 


4x4 


38765 


1995 


0.5012E+05 


25 0.867± 0.006 10 


8x8 


39851 


316 


0.1995E+05 


63 0.987± 0.004 10 


16 x 16 


39481 


7943 


0.5012E+06 


63 0.950± 0.005 20 


32x32 


39654 


1259 


0.1259E+06 100 1.033± 0.004 20 


4x4 


38765 


1995 


0.5012E+05 


25 0.867± 0.006 20 


8x8 


39851 


316 


0.1995E+05 


63 0.987± 0.004 20 


16x 16 


39481 


7943 


0.5012E+06 


63 0.950± 0.005 50 


32x32 


39654 


1259 


0.1259E+06 100 1.033± 0.004 50 


4x4 


34113 


3162 


0.5012E+05 


16 0.864± 0.007 50 


8x8 


39851 


316 


0.1995E+05 


63 0.987± 0.004 50 
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V. CONCLUSIONS 



For power-law distributions, the fitting and testing the goodness-of-fit is a difficult but very 
relevant problem in complex-systems science, in general, and in geoscience in particular. The 
most critical step is to select, automatically (without introducing any subjective bias), where the 
power-law regime starts and where it ends. We have explained in detail a conceptually simple 
but somewhat laborious procedure in order to overcome some difficulties previously found in the 
method introduced by Clauset et al. Il54ll56l . Our method is summarized in fitting by maximum 
likelihood estimation and testing the goodness of fit by the Kolmogorov-Smirnov statistic, using 
Monte Carlo simulations. Although these steps are in common with the Clauset et al.'s recipe, 
the key difference is in the criterion of selection of the power-law range. Despite the many steps 
of these procedures, ours can be easily implemented, and the resulting algorithms run very fast 
in current computers. We also have explained how to estimate properly the probability density 
of a random variable which has a power law or a fat-tail distribution. This is important to draw 
graphical representations of the results of the fitting (specially in Fig. 5) but it is not required to 
perform neither the fits nor the goodness-of-fit tests. 

The performance of the method is quite good, as checked in synthetic power-law datasets, and 
the results of the analysis of previously reported power laws are very consistent. We confirm a 
very broad power-law tail in the distribution of the half-lives of the radionuclides, with exponent 
a = 1.09, as well as other power-law regimes in the body of the distribution. The results for the 
power-law exponent of the distribution of seismic moments worldwide and in Southern California 
are in agreement with previous estimates, but in addition our method provides a reliable way to 
determining the minimum seismic-moment value for which the Gutenberg-Richter law holds. This 
can be useful to check systematically for the completeness thresholds of seismic catalogs. For the 
energy dissipated by tropical cyclones, measured roughly through the PDI, we confirm the power- 
law behavior in the body of the distribution previously reported, with exponents close to one. We 
also survey new results for the Southern Indian Ocean, but with a higher power-law exponent. In 
contrast, for the case of the area affected by forest fires in an Italian catalog, we obtain power- 
law-distributed behavior only for rather small windows of the burnt area, containing few number 
of data. Finally, for the waiting times between earthquakes in different subdivisions of Southern 
California we conclude that the power-law behavior of the tail is very delicate, affected either by 
a small number of subdivisions, when the size of those is large, or by the finite duration of the 
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record, which introduces a sharp decay of the distribution when their number is high. For the body 
of the distribution another power law is found, but the range is limited by crossovers below and 
above it. Naturally, our method can be directly applied to the overwhelming number of fat-tailed 
distributions reported during the last decades in geoscience. 
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